Network Support Library

Text File | 1993-10-22 | 20KB | 409 lines

(C) 1989,90,91,92,93 NONSTOP NETWORKS LIMITED.

***PREPARE FOR A CRASH!***

It is inevitable that your Primary Server will someday crash. No*STOP NETWORK, of course, will keep you running on the Secondary. It does not, however, automatically provide Secondary Server login capability. You must do this yourself by creating login scripts in advance on the Secondary Server for use when your Primary Server is down. These login scripts should, in most cases, be the same as the login scripts that were on your Primary Server before you installed No*STOP NETWORK. Only the server name will be different.

This ensures that users who were not on the system at the time of the crash, or those who were on the system but subsequently logged out, will be able to log in to the Secondary Server and use it while the Primary Server is being repaired.

[REMEMBER TO RUN RECOVERY (SEC->PRIM) AFTER THE PRIMARY HAS BEEN REPAIRED AND BEFORE MIRRORING IS RESTARTED]

***USER CONTROL***

OK, so you took the advice above so your users can access the Secondary Server while the Primary Server is down for repairs. So what's to keep them from logging in to the Secondary Server when the Primary Server is up? That could be a very bad situation. What is needed is a way to keep users from logging in to the Secondary unless the Primary is off-line.

The example below is for users of NetWare 3.x. The GOTO command is not supported for NetWare 2.x. The lines in the example should be inserted at the beginning of the Secondary Server system login script. They keep all users except the SUPERVISOR from logging in to the Secondary Server if the Primary Server is up.

In this example, the SUPERVISOR is allowed in under the presumption that supervisors need to get in to do non-mirrored maintenance tasks. To lock the SUPERVISOR out as well, delete the first line.
    IF LOGIN_NAME == "SUPERVISOR" THEN GOTO OK2
    ATTACH (primary_server_name)
    IF ERROR_LEVEL != "0" THEN GOTO OK
    FIRE PHASERS 5 TIMES
    WRITE "YOU CANNOT LOGIN TO (secondary_server_name)!!!"
    [At this point you should delete all mappings established by the system login script]
    EXIT
    OK:
    WRITE "OK, SINCE (primary_server_name) IS DOWN, YOU MAY USE (secondary_server_name)"
    OK2:
    [The normal login script begins here]

We would appreciate any ideas for doing this more easily or better.

***Handles Tip***

As explained in the Manual, if you use 20 Handles without us, you will use another one for every file that is open concurrently and mirrored when we are running. Also, you are advised to increase the "FILES=" parameter in CONFIG.SYS to cover the number of originals plus the number of mirrors. If this number is then exceeded, NOSTOP will abort the process.

To avoid being aborted and let the application process the error, set the "FILES=" parameter to some number less than the total number needed. If this is done, DOS will run out of handles before NOSTOP does, so the normal DOS error will be passed to the application. When doing this, you may get our "MIRROR MISMATCH" error message. In this case, either decrease or increase the "FILES=" parameter by one and retry.

Related to the above instruction: If you are getting "MIRROR MISMATCH" errors when you know there is no actual mismatch, the problem is probably caused by the "FILES=" parameter in CONFIG.SYS being set too low.

***RECOVERY OF APPLICATION PROGRAM FILES***

There are three methods for installing applications in preparation for mirroring:

#1 Turn on No*STOP NETWORK and mirror the installation.
#2 Install on each server separately.
#3 Install on one server and copy application directories to the other server. This is best done using the RECOVERY utility included on the No*STOP NETWORK diskette.

In almost all cases, any method will work.
In the case where an application requires separate installation on each server (method #2), it is likely that server-unique data and/or structures are being created during installation. In this case, straightforward use of the recovery utilities will not be adequate when recovering from a server failure. In the worst case, the application will have to be re-installed on the recovered server. It may be possible in some cases, however, to prepare in advance for recovery by performing the following steps:

1. Install on the Primary Server to subdirectory \[app].
2. Install on the Secondary Server to subdirectory \[app].
3. Copy from Primary:[app] to Secondary:[appB].
4. Copy from Secondary:[app] to Primary:[appB].
5. When a server fails, run the recovery utilities.
6. Copy from "GOOD" Server:[appB] to "BAD" Server:[app].
7. And, to prepare for the next failure, copy from "GOOD" Server:[app] to "BAD" Server:[appB].

If this works, it will make re-installation unnecessary.

***INCOMPATIBILITIES***

Until further notice, be advised that we are incompatible with the following software:

LOTUS MAGELLAN - Total incompatibility.

WINDOWS SMARTDRIVE - When using the WINDOWS version of SMARTDRIVE (3-10-92), you may experience problems when a drive goes down, which can interfere with the continuous processing functions of No*STOP NETWORK. There is no problem with DOS 5 or 6 SMARTDRIVE.

MS-DOS 6.0 - When running WINDOWS (3.1 or WFW) under DOS 6.0, your workstation may hang if you exit WINDOWS directly after a server has failed. DOS 6 passed all of the other tests in our validation suite with flying colors. Stay tuned.

Update: We have found that the Novell patches in DOSUP7 and WINUP7 make the situation worse - the workstation hangs immediately after downing the server.

Update: One of our validation tests hangs, EVEN UNMIRRORED, when DBLSPACE is active. The test is a COBOL program running on LANTASTIC 5.0.
The error message is:

    "PSLINEHF segment RT: Error 198 @ COBOL PC086D"

The test runs perfectly, mirrored or unmirrored, when DBLSPACE is not active.

********************************************************************
HERE ARE SOME ADDITIONS TO THE NEXT VERSION OF THE MANUAL WHICH YOU MAY FIND USEFUL.
********************************************************************

3.5 Avoid Split Network Danger

Used incorrectly, server mirroring can introduce a danger not present in an un-mirrored environment - data can be corrupted when a split network is created due to the failure or partial inaccessibility of a server. This section will describe how a split network occurs and will define the actions necessary to avoid data corruption.

3.5.1 Local Area Network Cabling Topology

In a linear, "daisy-chain", network the potential exists for what is called a "split network". A split network is created when servers which are not topologically next to each other are cut off from each other by the failure of a link in the network, such that some workstations can access one of the servers and other workstations can access the other server, one or more workstations being unable to access both servers.

When mirroring servers, a split network condition can have the result that some workstations are updating the data base on one server, while other workstations are updating the data base on the other server. This causes the versions of the data base on the two servers to diverge. If the link is re-established and the servers are re-synchronized by using No*STOP RECOVERY (or any other similar method), the updates performed on one of the servers will be lost.

For this reason, attention should be given to the logical topology of your network when mirroring servers. There are two simple rules to follow for daisy chain networks:

o ALWAYS LOCATE YOUR SERVERS NEXT TO EACH OTHER TOPOLOGICALLY,
o ALWAYS LOCATE YOUR SERVERS AT THE END OF THE CHAIN (either end will do).
[sorry, figures unavailable]

Figure 1 illustrates a good topology:

A. The link between servers 1 and 2 is broken: Both workstations continue to access server2.
B. The link between workstations 1 and 2 is broken: Workstation1 continues to access both servers. Workstation2 can access neither server.
C. The link between the servers and the workstations is broken: Neither workstation can access either server.

In all cases, the integrity of the data base is uncompromised.

Figure 2 illustrates a bad topology:

D. The link between server1 and workstation1 is broken: Both workstations continue to access server2.
E. The link between workstation2 and server2 is broken: Both workstations continue to access server1.
F. The link between workstations 1 and 2 is broken: Workstation1 continues to access server1. Workstation2 continues to access server2.

In cases D and E, the integrity of the data base is maintained. In case F, the two servers will develop different versions of the data base. If the two are not synchronized, data corruption is likely to occur. If they are synchronized, updates to one of the servers will be lost.

Figure 3 illustrates another bad topology:

G. The link between workstation1 and server1 is broken: Workstation1 can access neither server. Workstation2 continues to access both servers.
H. The link between server2 and workstation2 is broken: Workstation1 continues to access both servers. Workstation2 can access neither server.
I. The link between servers 1 and 2 is broken: Workstation1 continues to access server1. Workstation2 continues to access server2.

In cases G and H, the integrity of the data base is maintained. In case I, the two servers will develop different versions of the data base. If the two are not synchronized, data corruption is likely to occur. If they are synchronized, updates to one of the servers will be lost.

Ring network topologies, such as IBM Token Ring, do not exhibit this design vulnerability.
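Since the figures are unavailable, cases A through I can be checked with a small simulation. The Python sketch below is an illustration only and is not part of No*STOP NETWORK. The three chain layouts are assumptions inferred from the case descriptions (Figure 1 as server1-server2-workstation1-workstation2, Figure 2 as server1-workstation1-workstation2-server2, Figure 3 as workstation1-server1-server2-workstation2), and the function names are hypothetical. It breaks each link in turn and reports the links whose failure leaves different workstations updating different servers.

```python
def reachable_servers(chain, broken_link):
    """Map each workstation to the set of servers it can still reach.

    chain is the ordered node list of a linear (daisy-chain) network.
    broken_link is the index i of the failed link between chain[i]
    and chain[i + 1]. Names starting with "S" are servers and names
    starting with "W" are workstations (a naming assumption).
    """
    segments = [chain[: broken_link + 1], chain[broken_link + 1:]]
    access = {}
    for segment in segments:
        servers = {node for node in segment if node.startswith("S")}
        for node in segment:
            if node.startswith("W"):
                access[node] = servers
    return access


def is_split(access):
    """True if two workstations can each reach a server, but not the same one."""
    views = {frozenset(servers) for servers in access.values() if servers}
    return any(a.isdisjoint(b) for a in views for b in views if a != b)


def dangerous_links(chain):
    """List the links whose failure creates a split network."""
    return [
        (chain[i], chain[i + 1])
        for i in range(len(chain) - 1)
        if is_split(reachable_servers(chain, i))
    ]


# Layouts inferred from cases A-I; the original figures are not available.
FIGURES = {
    "Figure 1": ["S1", "S2", "W1", "W2"],
    "Figure 2": ["S1", "W1", "W2", "S2"],
    "Figure 3": ["W1", "S1", "S2", "W2"],
}

for name, chain in FIGURES.items():
    print(name, dangerous_links(chain))
```

Under these assumed layouts, Figure 1 has no dangerous link, while Figures 2 and 3 each have exactly one, corresponding to cases F and I.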
3.5.2 Wide Area Network Connections

If your network has servers that are separated geographically such that a communications link is required, it is, almost by definition, a latent split network. If the link is cut, the users at the separate sites will be updating their respective locally resident servers without reference to the distant servers.

The word "almost" is used because it is still possible to put both servers at the same end of the network, that is, at the same site. In this case, if the communications link is broken, the users at the non-servered site will lose access to both servers. Some enterprises can live with this, others cannot. Also, in this configuration, any possible performance benefits of cross mirroring due to nearness will be lost to the remote site. Performance benefits of cross mirroring due to load leveling will still accrue.

3.5.3 Non-Dedicated Servers

In some installations one (or both) of the servers is enlisted for double duty - as a server for the network and as a workstation. In this case, the network is split by definition. If the cable attached to the server/workstation fails or is knocked loose, the workstation partition will lose the other server but can still update its resident server. The other workstations will lose the isolated server but can still update the other server. This configuration can be thought of as introducing an additional termination to the topology, with a workstation at the end.

3.5.4 Dealing With The Problem

Some subset of the universe of installations will be, for varying reasons, unable to avoid the occurrence of a split network. Most, if not all, of these reasons will be physical. Most of the members of this subset, however, will not be unduly discommoded by a split network, or at least will find the benefits of mirroring to outweigh the inconveniences brought about by a network which has been split due to partial server inaccessibility.
3.5.4.1 Defining The Problem

The danger inherent in a split network is, as described above, that separate groups of clients can be updating separate servers, creating two versions of the data base.

3.5.4.1.1 Lost Updates

The updates made to one of the servers will be lost during Recovery.

3.5.4.1.2 Data Base Corruption

It is possible, depending on the nature of activity against the data base, that it will be corrupted.

3.5.4.1.3 Incomplete Data

Incomplete data is defined as data that is correct in itself, but which is not as rich in information as it could be. Users on different sides of the split may be working without the benefit of updates from the other side. The severity of this situation depends on the nature of the enterprise and the length of time the split persists. In any case, by definition for this discourse, it is considered better to be working with incomplete data than not to be working at all. For example, in a mail advertising campaign, it is better to send advertising copy to prospects recently removed from the data base, or to fail to send to prospects recently added, than to send no mail at all.

3.5.4.1.4 Erroneous Data

Erroneous data is defined as data that can lead to inappropriate and damaging action. The damage can be to the data base or directly to the operations of the enterprise it supports. Thus, nearly simultaneous withdrawals from an account at different branch offices of a bank which together, but not separately, exceed the balance of the account can expose the bank to fraud if the branches are on different sides of a split network.

Fortunately, there are several ways to completely avoid these dangers. Some of these methods, however, will cause inconvenience.

3.5.4.2 Removing The Danger

If you find it impossible to avoid a latent split network, it is prudent to take measures which will guarantee that no danger to your data base or operations exists.
The key concept in avoiding danger to the data base is that when the network becomes split, updates to common data must be restricted to only one of the servers.

3.5.4.2.1 Partition by Data

In the best of worlds, you will be able to partition your users such that those on one side of the potential split are performing activities against data completely unrelated to those on the other side. Suppose, for instance, that the latent split situation is forced on you by the geographic separation of offices, one of which performs CAD/CAM operations related to engineering design, and one of which performs accounting operations in support of day to day operation of the business. In this case, a break in communications between the two offices, while creating a split network, will not affect operations. Indeed, this situation presents opportunities for performance enhancement through the use of cross mirroring.

Take note, however, that in this situation, special Recovery procedures must be followed after re-establishing the link between the two splits. In short, two Recovery jobs must be run, one for each of the partitions, and each in a different direction.

3.5.4.2.2 Partition by Access

If your users can be partitioned such that those on one side of the split are only reading the data base, the situation is almost as good as if they were partitioned by data. Thus, if the link is broken, they can continue operations with no danger to the data base, albeit with slightly out of date data. The mail advertising operation can again be used as an example. One site may have the responsibility of updating the data base of addresses while the other site is reading the data base to support its mailings.

3.5.4.2.3 Partition by Expendability

If you are not fortunate enough to be able to partition by data or access, you will have to bite the bullet and forcibly restrict access to the data base by one of the sides of the split.
Operational measures must be taken to ensure that only one side is updating. The easiest fool-proof way to do this is as follows:

o Choose which side is expendable.
o Invoke No*STOP NETWORK for the expendable side with message redirection, changing the default response to a drive failure from "DROP AND CONTINUE" to "ABORT".
o After they have aborted, you can, depending on your operations, re-institute them as READ-ONLY users, put them into transaction logging mode (for delayed updates), put them to other tasks which do not reference the data base, or leave them idle.

3.6 Performance Enhancement Notes

Some thought should be given to performance issues when designing your network for mirroring. Although the overhead imposed by No*STOP NETWORK is small, care should be taken during the design phase to minimize it. In some cases No*STOP NETWORK, judiciously employed, can actually improve the performance of your network.

The following guidelines all spring from the fact that No*STOP NETWORK does not mirror READs.

3.6.1 Read From The Faster Server

If your servers are of unequal speeds, make the faster server your Primary when declaring drive pairs. In this way you will be retrieving data from the most efficient resource.

3.6.2 Bring Your Data Closer

There is an opportunity in certain types of installations to significantly improve retrieval times for all or some of the users. If, for instance, their work consists of extensive searches of a data base, perhaps correlating data into information via repeated complex queries, this work could be speeded up significantly by doing those queries on local equipment.

If the users have sufficient local resources, they can download the data base and supporting files such as indexes to their local hard drive and declare it (or a subdirectory of it) the Primary. In this way, all READs are performed locally without reference to the network.
This can speed up response at the workstation while at the same time reducing traffic on the network. An even more dramatic improvement is possible if a workstation RAM drive can be declared the Primary.

This approach is, of course, not for all enterprises. Of largest concern is the fact that it creates a latent split network by mimicking a non-dedicated server at each workstation using it. It should also be noted that the users addressing their local resources will not have access to updates performed since their most recent download. If this is not of paramount concern, or if the data being accessed locally is private data, such as an intelligence analyst's "shoebox" files, this approach can, with careful planning, provide large benefits.

Bear in mind that the flexibility of No*STOP NETWORK will allow you to define drive pairs such that one or more pairs can support the local activity (e.g., C: to F:), while other pairs can support access to a common data base (e.g., L: to M:), possibly to send updates based on investigation of private files.

If the users performing local processing wish to have their private files maintained on two file servers for maximum safety, No*STOP NETWORK-MM can be used to support more than one Secondary server (e.g., C: to F: to G:).

3.6.3 Cross Mirroring to Level the Load

If you have purchased a second file server in order to gain the benefits of Level 3 Fault Tolerance, you may be pleasantly surprised to discover that yours may be the type of enterprise which can derive performance benefits from this extra server. If your users can be partitioned according to the data they use, you can cross mirror so that one group of users is doing its READ accesses on one server, while the other group is doing them on the other server, thus providing a form of parallel access to the network for retrieval activity.
For example, if you have a user group performing CAD/CAM activities in support of engineering design, and another group of users performing corporate accounting activities, they are more than likely accessing completely disparate data, having no data in common. In this case there is no danger of data corruption, and whereas before one server was doing all the READing, now another server is enlisted to share the load. To accomplish cross mirroring in the example cited, simply turn No*STOP NETWORK on for the engineers, mirroring from, for example, F: to G:, and, for the accountants, from G: to F:.
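The load-leveling effect can be illustrated with a rough operation count. The Python sketch below is a simplified model and not a measurement: it assumes only what is stated above, namely that READs go to the Primary of a drive pair while WRITEs go to both drives. The workload figures and the function name are hypothetical, chosen purely for illustration.

```python
from collections import Counter


def drive_load(pairs, workload):
    """Count the operations each drive services.

    pairs maps a user group to its (primary, secondary) drive pair.
    workload maps a user group to its (reads, writes) operation counts.
    READs hit only the primary; WRITEs hit both drives of the pair.
    """
    load = Counter()
    for group, (reads, writes) in workload.items():
        primary, secondary = pairs[group]
        load[primary] += reads + writes
        load[secondary] += writes
    return dict(load)


# Hypothetical operation counts, for illustration only.
workload = {"engineers": (900, 100), "accountants": (700, 300)}

# Everyone mirrors in the same direction, F: to G:.
same_direction = {"engineers": ("F:", "G:"), "accountants": ("F:", "G:")}

# Cross mirroring: engineers F: to G:, accountants G: to F:.
cross_mirrored = {"engineers": ("F:", "G:"), "accountants": ("G:", "F:")}

print("same direction:", drive_load(same_direction, workload))
print("cross mirrored:", drive_load(cross_mirrored, workload))
```

In this model the total number of operations is the same either way; cross mirroring only moves the accountants' READs from F: to G:, so the benefit depends on how read-heavy the two groups are.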